Automatic capture and efficient storage of e-Science experiment provenance

Authors

  • Roger S. Barga
  • Luciano A. Digiampietri

Abstract

For the First Provenance Challenge, we introduce a layered model to represent workflow provenance that allows navigation from an abstract model of the experiment to instance data collected during a specific experiment run. We outline modest extensions to a commercial workflow engine so it will automatically capture provenance at workflow runtime. We also present an approach to store this provenance data in a relational database. Finally, we demonstrate how core provenance queries in the challenge can be expressed in SQL and discuss the merits of our layered representation.
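
As a rough illustration of the ideas in the abstract, the sketch below stores a two-layer provenance record in a relational database and answers a lineage question in SQL. It is not the authors' schema: the table names (`abstract_step`, `step_run`, `run_io`), the step and file names, and the use of SQLite are all assumptions made for this example. It only shows how a query can start from instance data recorded during a run and navigate back to the steps of the abstract experiment model.

```python
import sqlite3

# Hypothetical schema, invented for this sketch (not taken from the paper).
SCHEMA = """
CREATE TABLE abstract_step (       -- abstract layer: steps of the experiment design
    step_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE step_run (            -- instance layer: one execution of a step
    run_id       INTEGER PRIMARY KEY,
    step_id      INTEGER NOT NULL REFERENCES abstract_step(step_id),
    workflow_run TEXT NOT NULL
);
CREATE TABLE run_io (              -- data consumed or produced by an execution
    run_id    INTEGER NOT NULL REFERENCES step_run(run_id),
    artifact  TEXT NOT NULL,
    direction TEXT NOT NULL CHECK (direction IN ('in', 'out'))
);
"""

# Recursive query: start at the execution that produced the artifact, then
# repeatedly follow each consumed input back to the execution that produced it.
LINEAGE_SQL = """
WITH RECURSIVE lineage(run_id) AS (
    SELECT run_id FROM run_io WHERE artifact = ? AND direction = 'out'
    UNION
    SELECT producer.run_id
    FROM lineage
    JOIN run_io AS consumed
      ON consumed.run_id = lineage.run_id AND consumed.direction = 'in'
    JOIN run_io AS producer
      ON producer.artifact = consumed.artifact AND producer.direction = 'out'
)
SELECT s.name
FROM lineage
JOIN step_run AS r ON r.run_id = lineage.run_id
JOIN abstract_step AS s ON s.step_id = r.step_id
ORDER BY r.run_id;
"""


def build_demo_db():
    """Create an in-memory database holding one made-up workflow run."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO abstract_step VALUES (?, ?)",
        [(1, "align"), (2, "reslice"), (3, "average"), (4, "unrelated")],
    )
    conn.executemany(
        "INSERT INTO step_run VALUES (?, ?, ?)",
        [(10, 1, "run-A"), (11, 2, "run-A"), (12, 3, "run-A"), (13, 4, "run-A")],
    )
    conn.executemany(
        "INSERT INTO run_io VALUES (?, ?, ?)",
        [
            (10, "raw.img", "in"), (10, "warp.params", "out"),
            (11, "warp.params", "in"), (11, "resliced.img", "out"),
            (12, "resliced.img", "in"), (12, "atlas.img", "out"),
            (13, "other.img", "in"), (13, "other.out", "out"),
        ],
    )
    return conn


def steps_leading_to(conn, artifact):
    """Return the abstract step names whose executions led to `artifact`."""
    return [row[0] for row in conn.execute(LINEAGE_SQL, (artifact,))]


if __name__ == "__main__":
    conn = build_demo_db()
    print(steps_leading_to(conn, "atlas.img"))
```

Keeping the abstract steps and their executions in separate tables means the same lineage query can report results at either level: joining to `abstract_step` gives the design-level view, while selecting from `step_run` and `run_io` alone gives the run-specific detail.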

Similar resources

Automatic Generation of Workflow Provenance

While workflow is playing an increasingly important role in eScience, current systems lack support for the collection of provenance data. We argue that workflow provenance data should be automatically generated by the enactment engine and managed over time by an underlying storage service. We briefly describe our layered model for workflow execution provenance, which allows navigation from the ...

Model of Karma Version 3

Provenance that captures e-Science activity has long term value only if the right amount and kind of information is collected. In this paper, we propose a two-layer model for representing provenance information capable of representing both execution information and higher level process details. The information model forms the basis for efficient relational database storage and query, and sets t...

A user-orientated approach to provenance capture and representation for in silico experiments, explored within the atmospheric chemistry community.

We present a novel user-orientated approach to provenance capture and representation for in silico experiments, contrasted against the more systems-orientated approaches that have been typical within the e-Science domain. In our approach, we seek to capture the scientist's reasoning in the form of annotations as an experiment evolves, while using the scientist's terminology in the representatio...

A platform for all that we know: creating a knowledge-driven research infrastructure

Computer systems have become a vital part of the modern research environment, supporting all aspects of the research lifecycle [1]. The community uses the terms "eScience" and "eResearch" to highlight the important role of computer technology in the ways we undertake research, collaborate, share data and documents, submit funding applications, use devices to automatically and accurately c...

Sheer Curation of Experiments: Data, Process, Provenance

This paper describes an environment for the “sheer curation” of the experimental data of a group of researchers in the fields of biophysics and structural biology. The approach involves embedding data capture and interpretation within researchers' working practices, so that it is automatic and invisible to the researcher. The environment does not capture just the individual datasets generated b...

Journal:
  • Concurrency and Computation: Practice and Experience

Volume 20

Publication date: 2008